-
Notifications
You must be signed in to change notification settings - Fork 99
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix crash during normalization of time zones #1407
base: master
Are you sure you want to change the base?
Fix crash during normalization of time zones #1407
Conversation
077cc8e
to
2b8dd29
Compare
@@ -1040,7 +1054,12 @@ def _custom_pack(self, obj): | |||
def _ext_hook(self, code, data): | |||
if code == MsgPackSerialization.PD_TIMESTAMP: | |||
data = MsgPackNormalizer._nested_msgpack_unpackb(data) | |||
return pd.Timestamp(data[0], tz=data[1]) if data[1] is not None else pd.Timestamp(data[0]) | |||
val, tz, offset = data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be wise to keep using the numeric indexing so that we can continue to extend if needed.
Also this needs to be able to read old data which only had two fields. So we should add a test that covers the legacy case as well.
if len(data) == 2:
val, tz = data
# old method...
else:
val, tz, offset = data[:3]
# new methods...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed I had forgotten to push these changes.
As for testing the legacy case, we can do it only with the compatibility tests and I have added the needed changes here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does the compatibility test change result in a test that exercises the if len(data)==2:
block?
if tz: | ||
# We need to convert the string tz to a timedelta to denormalize to a datetime.timezone | ||
obj = pd.Timestamp(val, tz=tz) | ||
offset = obj.tzinfo.utcoffset(obj).total_seconds() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests for this case, both asserting zero and non-zero offset please.
b26cd53
to
6b24588
Compare
This PR does change the stored and returned timezone data. We should likely hold off for a major version, just as with #1410 |
641d6ab
to
af479a5
Compare
Add another chak for the tests Address PR comments and add more test cases Fix edge case on Windows Fix broken test on windows Recreate the timestamp with timezone Coerce all timezone to datetime.timezone Add metadata to compat tests Add meta to compat tests Add a check for the buff len from denormalizing timestamps Add a note in the docs about using the time zone info to create other time zone types Add comment about the old PD_TIMESTAMP normalization Handle case when we have a valid tz from a past version Fix tz in meta in backwards compat test Fix install of old protobuf version Remove outdated assert Add 0 offset case Add write of meta data Fix interop test Add metadata to append Rebase Fix
af479a5
to
1e2deb8
Compare
Reference Issues/PRs
Fixes #1406
What does this implement or fix?
Converts time zone data to string when normalizing with MsgPack.
Adds tests to make sure that the crash doesn't reproduce in the future.
Any other comments?
Checklist
Checklist for code changes...